| Date | Topic |
|---|---|
| 14.11.2024 | Data preparation and manipulation |
| 21.11.2024 | Basic statistics and data analysis with R |
| 21.11.2024 | Exercises/Workshop 4: Data gathering, data import |
| 28.11.2024 | Visualisation |
Lecture 7: Data Preparation
2024-11-14
Source: https://www.storybench.org/wp-content/uploads/2017/05/tidyverse.png
| Date | Topic |
|---|---|
| 05.12.2024 | Guest Lecture: Data Handling @Deloitte (Rachel Lund, Senior Economist) |
| 05.12.2024 | Exercises/Workshop 5: Data preparation and applied data analysis with R |
| 12.12.2024 | Analytics, more visualisation, and data products |
| 19.12.2024 | Summary, Wrap-up, Final workshop |
| 19.12.2024 | Exercises/Workshop 6: Visualization, dynamic documents |
| 19.12.2024 | Exam for Exchange Students |
Following R4DS, a dataset is tidy when…
Tidy data. Source: R4DS.
In Economics, the definition of an observation can vary:
- Panel data: tracks the same units over time, so each unit has multiple observations across time periods.
  Observation: a measurement for a specific unit at a particular point in time.
- Cross-sectional data: a “snapshot” of different units at the same moment.
  Observation: a single measurement for each unit at a single point in time.
- Time series data: a single unit (or aggregate) tracked over time.
  Observation: a measurement of a variable for that unit at one point in time; the series collects these observations over multiple points in time.
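The three structures differ only in which dimension varies. A minimal sketch, assuming a hypothetical panel of two countries observed over three years (the names `panel`, `country`, `gdp` and all values are made up for illustration): filtering the panel on one year yields a cross-section, and filtering it on one country yields a time series.

```r
library(dplyr)
library(tibble)

# Hypothetical panel data: two countries (units) observed in three years.
# All values are made up for illustration only.
panel <- tibble(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2019:2021, times = 2),
  gdp     = c(100, 102, 105, 50, 51, 53)
)

# Cross-section: all units at a single point in time
cross_section <- panel %>% filter(year == 2020)

# Time series: a single unit across all time points
time_series <- panel %>% filter(country == "A")

cross_section
time_series
```

In this tidy layout, each row of `panel` is one country-year observation.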
```
# A tibble: 2 × 4
  measure     `Jan 1` `Jan 2` `Jan 3`
  <chr>         <dbl>   <dbl>   <dbl>
1 Temperature      20      22      21
2 Humidity         80      78      82
```
…
```
# A tibble: 3 × 3
  Date  Temperature Humidity
  <chr>       <dbl>    <dbl>
1 Jan 1          20       80
2 Jan 2          22       78
3 Jan 3          21       82
```
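One way to get from the table with dates as column headers to the tidy table with one row per date is a two-step reshape with tidyr. A minimal sketch, assuming the values shown in the tables above; the object names `weather_wide` and `weather_tidy` and the intermediate column `value` are hypothetical:

```r
library(dplyr)
library(tibble)
library(tidyr)

# Untidy: dates (values of a variable) are stored as column names
weather_wide <- tibble(
  measure = c("Temperature", "Humidity"),
  `Jan 1` = c(20, 80),
  `Jan 2` = c(22, 78),
  `Jan 3` = c(21, 82)
)

weather_tidy <- weather_wide %>%
  # Step 1: move the date columns into rows (one row per measure and date)
  pivot_longer(cols = -measure, names_to = "Date", values_to = "value") %>%
  # Step 2: give each measure its own column (one row per date)
  pivot_wider(names_from = measure, values_from = value)

weather_tidy
```

The first step alone already produces a tidy long table (`measure`, `Date`, `value`); the second step is only needed if each measure should become its own variable.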
```
# A tibble: 3 × 2
   year temperature_location
  <dbl> <chr>
1  2019 22C_London
2  2019 18C_Paris
3  2019 25C_Rome
```
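Here a single cell stores two values, so the column has to be split. A minimal sketch using tidyr's `separate()`, assuming the underscore always separates the two parts and that the trailing "C" is a unit (degrees Celsius) that can be dropped; the object names are hypothetical:

```r
library(dplyr)
library(tibble)
library(tidyr)

# Untidy: one cell holds two values (temperature and location)
weather_loc <- tibble(
  year = c(2019, 2019, 2019),
  temperature_location = c("22C_London", "18C_Paris", "25C_Rome")
)

weather_loc_tidy <- weather_loc %>%
  # Split the combined column at the underscore into two columns
  separate(temperature_location, into = c("temperature", "location"), sep = "_") %>%
  # Assumption: "C" is a unit suffix; strip it and convert to numeric
  mutate(temperature = as.numeric(sub("C$", "", temperature)))

weather_loc_tidy
```

In tidyr 1.3.0 and later, `separate()` is superseded by `separate_wider_delim()`, which does the same split with a more explicit interface.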
Homework…
```
   Student Econ DataHandling Management
1 Johannes 5.00          4.0        5.5
2   Hannah 5.25          4.5        6.0
3     Igor 4.00          5.0        6.0
```
Long and wide data. Source: Hugo Tavares
- Wide to long: melt(), gather() (gather() is superseded in tidyr) 👉 We’ll use tidyr::pivot_longer().
- Long to wide: cast(), spread() (spread() is superseded in tidyr) 👉 We’ll use tidyr::pivot_wider().
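A minimal sketch of both directions, using the grades table shown earlier re-typed as a tibble. The object names and the new column names `course` and `grade` are hypothetical choices:

```r
library(dplyr)
library(tibble)
library(tidyr)

# Wide: one column per course (values from the grades table)
grades_wide <- tibble(
  Student      = c("Johannes", "Hannah", "Igor"),
  Econ         = c(5.00, 5.25, 4.00),
  DataHandling = c(4.0, 4.5, 5.0),
  Management   = c(5.5, 6.0, 6.0)
)

# Wide -> long: one row per student and course
grades_long <- grades_wide %>%
  pivot_longer(cols = -Student, names_to = "course", values_to = "grade")

# Long -> wide: back to one column per course
grades_wide_again <- grades_long %>%
  pivot_wider(names_from = course, values_from = grade)

grades_long
```

The long version (9 rows) is convenient for grouping and plotting by course; `pivot_wider()` restores the original layout.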
Combining data frames by rows:
- rbind() in base R
- bind_rows() from dplyr

For these reasons (plus performance, handling of row names, and handling of factors), dplyr::bind_rows() is preferred in most applications.
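One difference that makes `bind_rows()` convenient: it matches columns by name and fills columns that are missing in one input with `NA`, whereas `rbind()` on data frames stops with an error when the inputs do not have the same columns. A minimal sketch with two hypothetical tibbles (`batch_1`, `batch_2` are illustrative names):

```r
library(dplyr)
library(tibble)

# Two hypothetical batches of data; the second has an extra column
batch_1 <- tibble(id = 1:2, value = c(10, 20))
batch_2 <- tibble(id = 3L, value = 30, source = "survey")

# bind_rows() matches columns by name and fills missing ones with NA
combined <- bind_rows(batch_1, batch_2)
combined

# Base rbind() requires matching columns, so this call throws an error
rbind_result <- tryCatch(
  rbind(batch_1, batch_2),
  error = function(e) conditionMessage(e)
)
rbind_result
```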
Long and wide data with code. Source: Hugo Tavares